Power Lines with ibis

Author

Dheepak Krishnamurthy

Published

November 30, 2024

Keywords

python, ibis, folium, geospatial, power lines, duckdb

This is a reproduction of the post on the official ibis website with minor modifications.

I like to add the following to the top of all my notebooks.

%reload_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "last_expr_or_assign"

Install

The packages can be installed with uv sync command if you use the lock file in the source directory.

uv sync

Alternatively, you can install the packages with the following commands.

uv add 'ibis-framework[duckdb,geospatial]' folium geopandas

Imports

import ibis
from ibis import _
ibis.options.interactive = True

Download the data

con = ibis.duckdb.connect(extensions=["spatial"])
<ibis.backends.duckdb.Backend at 0x348046570>
import os

if not os.path.exists("power-infra-usa.geoparquet"):
    # look into type infrastructure
    url = "s3://overturemaps-us-west-2/release/2024-07-22.0/theme=base/type=infrastructure/*"
    t = con.read_parquet(url, table_name="infra-usa")

    # filter for USA bounding box, subtype="power", and selecting only few columns
    expr = t.filter(
        _.bbox.xmin > -125.0,
        _.bbox.ymin > 24.8,
        _.bbox.xmax < -65.8,
        _.bbox.ymax < 49.2,
        _.subtype == "power",
    ).select(["names", "geometry", "bbox", "class", "sources", "source_tags"])

    con.to_parquet(expr, "power-infra-usa.geoparquet")

None

Filter and destructure

With ibis we can read the power infrastructure data.

usa_power_infra = con.read_parquet("power-infra-usa.geoparquet")
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ names  geometry                                                                          bbox                                                                class        sources                                                                           source_tags                                  ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stru…geospatial:geometrystruct<xmin: float32, xmax: float32, ymin: float32, ymax: float32>stringarray<struct<property: string, dataset: string, record_id: string, update_time:…map<string, string>                          │
├───────┼──────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ NULL<POINT (-114.291 27.151)>{'xmin': -114.29095458984375, 'xmax': -114.29093933105469, ... +2}power_tower[{...}]{}                                           │
│ NULL<POINT (-114.289 27.149)>{'xmin': -114.28852844238281, 'xmax': -114.28851318359375, ... +2}power_tower[{...}]{}                                           │
│ NULL<POINT (-114.29 27.15)>{'xmin': -114.29006958007812, 'xmax': -114.29005432128906, ... +2}power_tower[{...}]{}                                           │
│ NULL<POINT (-114.29 27.151)>{'xmin': -114.2900619506836, 'xmax': -114.29004669189453, ... +2}power_tower[{...}]{}                                           │
│ NULL<POLYGON ((-114.29 27.152, -114.29 27.151, -114.29 27.15, -114.291 27.151, -...>{'xmin': -114.29100036621094, 'xmax': -114.28948974609375, ... +2}substation [{...}]{'location': 'outdoor', 'voltage': '115000'} │
│ NULL<POINT (-114.291 27.151)>{'xmin': -114.29055786132812, 'xmax': -114.29054260253906, ... +2}portal     [{...}]{}                                           │
│ NULL<POINT (-114.291 27.152)>{'xmin': -114.29069519042969, 'xmax': -114.29067993164062, ... +2}power_tower[{...}]{}                                           │
│ NULL<POINT (-114.288 27.151)>{'xmin': -114.28756713867188, 'xmax': -114.28755187988281, ... +2}power_tower[{...}]{}                                           │
│ NULL<POINT (-114.286 27.152)>{'xmin': -114.28639221191406, 'xmax': -114.286376953125, ... +2}power_tower[{...}]{}                                           │
│ NULL<POINT (-114.285 27.152)>{'xmin': -114.2851333618164, 'xmax': -114.28511810302734, ... +2}power_tower[{...}]{}                                           │
│                                             │
└───────┴──────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────┴─────────────┴──────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────┘

Here’s all the types of infrastructure in the data:

usa_power_infra.rename(infrastructure_types="class").select(
    _.infrastructure_types
).distinct().execute()
infrastructure_types
0 transformer
1 generator
2 terminal
3 power_pole
4 power_tower
5 plant
6 switch
7 power_line
8 substation
9 portal
10 cable
11 catenary_mast
12 cable_distribution
13 minor_line
14 connection
15 insulator

We can get all the power lines using this filter:

(usa_power_infra.filter(_["class"] == "power_line"))
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ names  geometry                                                                          bbox                                                                class       sources                                                                           source_tags                              ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ stru…geospatial:geometrystruct<xmin: float32, xmax: float32, ymin: float32, ymax: float32>stringarray<struct<property: string, dataset: string, record_id: string, update_time:…map<string, string>                      │
├───────┼──────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────┼────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┤
│ NULL<LINESTRING (-115.191 28.056, -115.192 28.057, -115.192 28.058, -115.193 28....>{'xmin': -115.19832611083984, 'xmax': -115.19091796875, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-113.414 27.641, -113.414 27.642, -113.416 27.642, -113.418 27....>{'xmin': -114.29095458984375, 'xmax': -113.41442108154297, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '115000'} │
│ NULL<LINESTRING (-113.415 27.641, -113.415 27.642, -113.416 27.642, -113.416 27....>{'xmin': -113.41552734375, 'xmax': -112.40901184082031, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '115000'} │
│ NULL<LINESTRING (-118.433 33.353, -118.432 33.352, -118.431 33.352, -118.431 33....>{'xmin': -118.43326568603516, 'xmax': -118.42095947265625, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-118.489 33.434, -118.489 33.434, -118.489 33.433, -118.486 33....>{'xmin': -118.48948669433594, 'xmax': -118.31026458740234, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-118.337 33.341, -118.336 33.341, -118.336 33.341, -118.335 33....>{'xmin': -118.33689880371094, 'xmax': -118.33036804199219, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-118.345 33.347, -118.342 33.346, -118.342 33.345, -118.339 33....>{'xmin': -118.34489440917969, 'xmax': -118.31033325195312, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-118.335 33.341, -118.335 33.341, -118.335 33.342, -118.335 33....>{'xmin': -118.33499145507812, 'xmax': -118.33025360107422, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-118.346 33.347, -118.347 33.348, -118.349 33.349, -118.352 33....>{'xmin': -118.38180541992188, 'xmax': -118.34550476074219, ... +2}power_line[{...}]{}                                       │
│ NULL<LINESTRING (-118.483 33.435, -118.483 33.436, -118.484 33.436, -118.484 33....>{'xmin': -118.49791717529297, 'xmax': -118.48307800292969, ... +2}power_line[{...}]{}                                       │
│                                         │
└───────┴──────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────┴────────────┴──────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

This is what ibis does under the hood using sql and duckdb to extract data from the parquet file:

Code
ibis.to_sql(usa_power_infra.filter(_["class"] == "power_line"))
SELECT
  *
  REPLACE (ST_ASWKB("geometry") AS "geometry")
FROM (
  SELECT
    *
  FROM "ibis_read_parquet_fkt2iu36mreyfkelc4xed54ezy" AS "t0"
  WHERE
    "t0"."class" = 'power_line'
)

The source_tags column contains voltage information for some lines:

(usa_power_infra.filter(_["class"] == "power_line").select(_.source_tags).distinct())
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ source_tags                                         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ map<string, string>                                 │
├─────────────────────────────────────────────────────┤
│ {'frequency': '60', 'voltage': '500000;230000'}     │
│ {'frequency': '60', 'voltage': '46000;46000;13800'} │
│ {'frequency': '60', 'voltage': '138000;46000'}      │
│ {'frequency': '60', 'voltage': '138000;115000'}     │
│ {'voltage': '345000;345000'}                        │
│ {'voltage': '400000'}                               │
│ {'voltage': '115000;69000'}                         │
│ {'voltage': '115000', 'ref': 'EPE'}                 │
│ {'frequency': '60', 'voltage': '22000'}             │
│ {'voltage': '6'}                                    │
│                                                    │
└─────────────────────────────────────────────────────┘

You can see that source_tags is a map (or record or dictionary) that contains voltage information. Let’s extract just the first voltage value we can find into new column called voltage:

power_lines = (
    usa_power_infra.filter(_["class"] == "power_line")
    .filter(_.source_tags["voltage"] != ibis.null())
    .mutate(
        voltage=_.source_tags["voltage"]
        .split(";")[0]
        .split(":")[0]
        .split("/")[0]
        .split("'")[0]
        .split(",")[0]
        .split("=")[0]
        .try_cast("int64")
    )
    .filter([_.voltage.notnull(), _.voltage > 0.0])
)
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ names  geometry                                                                          bbox                                                                class       sources                                                                           source_tags                               voltage ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ stru…geospatial:geometrystruct<xmin: float32, xmax: float32, ymin: float32, ymax: float32>stringarray<struct<property: string, dataset: string, record_id: string, update_time:…map<string, string>int64   │
├───────┼──────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────┼────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────┼─────────┤
│ NULL<LINESTRING (-113.414 27.641, -113.414 27.642, -113.416 27.642, -113.418 27....>{'xmin': -114.29095458984375, 'xmax': -113.41442108154297, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '115000'}115000 │
│ NULL<LINESTRING (-113.415 27.641, -113.415 27.642, -113.416 27.642, -113.416 27....>{'xmin': -113.41552734375, 'xmax': -112.40901184082031, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '115000'}115000 │
│ NULL<LINESTRING (-115.987 30.715, -115.986 30.716, -115.986 30.715, -115.986 30....>{'xmin': -115.98729705810547, 'xmax': -115.90068054199219, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '115000'}115000 │
│ NULL<LINESTRING (-112.692 29.91, -112.693 29.91, -112.693 29.91, -112.693 29.909)>{'xmin': -112.69328308105469, 'xmax': -112.69247436523438, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│ NULL<LINESTRING (-112.691 29.909, -112.692 29.909, -112.692 29.909, -112.692 29....>{'xmin': -112.69207000732422, 'xmax': -112.69055938720703, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│ NULL<LINESTRING (-112.692 29.908, -112.691 29.908, -112.691 29.908, -112.691 29....>{'xmin': -112.69156646728516, 'xmax': -112.69055938720703, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│ NULL<LINESTRING (-112.692 29.909, -112.692 29.909, -112.692 29.909, -112.691 29.91)>{'xmin': -112.69243621826172, 'xmax': -112.69139099121094, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│ NULL<LINESTRING (-112.691 29.91, -112.692 29.909, -112.693 29.909, -112.693 29.909)>{'xmin': -112.69294738769531, 'xmax': -112.69139099121094, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│ NULL<LINESTRING (-112.648 29.904, -112.649 29.904, -112.651 29.905, -112.654 29....>{'xmin': -112.6930160522461, 'xmax': -112.64823913574219, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│ NULL<LINESTRING (-112.691 29.909, -112.689 29.911, -112.688 29.914, -112.686 29....>{'xmin': -112.6905746459961, 'xmax': -112.65251922607422, ... +2}power_line[{...}]{'frequency': '60', 'voltage': '230000'}230000 │
│  │
└───────┴──────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────┴────────────┴──────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┴─────────┘

And then finally we can filter only the power lines with a voltage greater than 230 kV:

power_lines_subset = power_lines.filter(_.voltage > 230e3).select(_.geometry)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ geometry                                                                         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ geospatial:geometry                                                              │
├──────────────────────────────────────────────────────────────────────────────────┤
│ <LINESTRING (-117.169 33.738, -117.169 33.738, -117.168 33.739, -117.168 33....> │
│ <LINESTRING (-117.168 33.739, -117.168 33.739, -117.168 33.739, -117.168 33....> │
│ <LINESTRING (-117.169 33.739, -117.169 33.739, -117.168 33.739, -117.168 33....> │
│ <LINESTRING (-117.158 33.738, -117.158 33.738)>                                  │
│ <LINESTRING (-117.158 33.738, -117.158 33.738, -117.157 33.738, -117.157 33....> │
│ <LINESTRING (-117.156 33.738, -117.156 33.738, -117.156 33.738, -117.157 33....> │
│ <LINESTRING (-117.158 33.738, -117.158 33.738, -117.158 33.739, -117.158 33....> │
│ <LINESTRING (-117.158 33.74, -117.158 33.739, -117.158 33.739, -117.158 33.7...> │
│ <LINESTRING (-117.158 33.738, -117.158 33.738, -117.158 33.739, -117.158 33....> │
│ <LINESTRING (-117.157 33.74, -117.157 33.739, -117.157 33.739, -117.157 33.7...> │
│                                                                                 │
└──────────────────────────────────────────────────────────────────────────────────┘

Visualize

import geopandas as gpd
gdf = gpd.GeoDataFrame(power_lines_subset.execute())
geometry
0 LINESTRING (-117.16902 33.7385, -117.1689 33.7...
1 LINESTRING (-117.16776 33.73928, -117.16776 33...
2 LINESTRING (-117.16903 33.73936, -117.1689 33....
3 LINESTRING (-117.15813 33.73814, -117.15784 33...
4 LINESTRING (-117.15784 33.73814, -117.15755 33...
... ...
22537 LINESTRING (-65.92864 47.88371, -65.92859 47.8...
22538 LINESTRING (-65.92816 47.88421, -65.92831 47.8...
22539 LINESTRING (-65.92916 47.88385, -65.92911 47.8...
22540 LINESTRING (-65.92904 47.88405, -65.92879 47.8...
22541 LINESTRING (-65.92847 47.88501, -65.92799 47.8...

22542 rows × 1 columns

import folium

m = folium.Map(location=[41.5435959, -99.8396373], zoom_start=5)

folium.GeoJson(gdf.to_json(), name="Power Lines").add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

Raw SQL

Here’s the raw SQL ibis generates:

Here’s the raw sql for the full query:

ibis.to_sql(power_lines_subset)
SELECT
  *
  REPLACE (ST_ASWKB("geometry") AS "geometry")
FROM (
  SELECT
    "t1"."geometry"
  FROM (
    SELECT
      "t0"."names",
      "t0"."geometry",
      "t0"."bbox",
      "t0"."class",
      "t0"."sources",
      "t0"."source_tags",
      TRY_CAST(LIST_EXTRACT(
        STR_SPLIT(
          LIST_EXTRACT(
            STR_SPLIT(
              LIST_EXTRACT(
                STR_SPLIT(
                  LIST_EXTRACT(
                    STR_SPLIT(
                      LIST_EXTRACT(
                        STR_SPLIT(
                          LIST_EXTRACT(
                            STR_SPLIT(
                              CASE
                                WHEN "t0"."source_tags" IS NULL
                                THEN NULL
                                ELSE COALESCE(
                                  LIST_EXTRACT(
                                    CASE
                                      WHEN 'voltage' IS NULL
                                      THEN NULL
                                      ELSE ELEMENT_AT("t0"."source_tags", 'voltage')
                                    END,
                                    1
                                  ),
                                  NULL
                                )
                              END,
                              ';'
                            ),
                            0 + CAST(0 >= 0 AS TINYINT)
                          ),
                          ':'
                        ),
                        0 + CAST(0 >= 0 AS TINYINT)
                      ),
                      '/'
                    ),
                    0 + CAST(0 >= 0 AS TINYINT)
                  ),
                  ''''
                ),
                0 + CAST(0 >= 0 AS TINYINT)
              ),
              ','
            ),
            0 + CAST(0 >= 0 AS TINYINT)
          ),
          '='
        ),
        0 + CAST(0 >= 0 AS TINYINT)
      ) AS BIGINT) AS "voltage"
    FROM "ibis_read_parquet_fkt2iu36mreyfkelc4xed54ezy" AS "t0"
    WHERE
      "t0"."class" = 'power_line'
      AND CASE
        WHEN "t0"."source_tags" IS NULL
        THEN NULL
        ELSE COALESCE(
          LIST_EXTRACT(
            CASE
              WHEN 'voltage' IS NULL
              THEN NULL
              ELSE ELEMENT_AT("t0"."source_tags", 'voltage')
            END,
            1
          ),
          NULL
        )
      END IS NOT NULL
  ) AS "t1"
  WHERE
    "t1"."voltage" IS NOT NULL AND "t1"."voltage" > 0.0 AND "t1"."voltage" > 230000.0
)

Reuse

Citation

BibTeX citation:
@online{krishnamurthy2024,
  author = {Krishnamurthy, Dheepak},
  title = {Power {Lines} with `Ibis`},
  date = {2024-11-30},
  url = {https://kdheepak.com/blog/ibis-power-lines/},
  langid = {en}
}
For attribution, please cite this work as:
D. Krishnamurthy, “Power Lines with `ibis`,” Nov. 30, 2024. https://kdheepak.com/blog/ibis-power-lines/.